SWAMP: An Isometric Frontend for Speaker Clustering
Abstract
In this paper, we describe a non-linear feature normalization based on Riemannian differential geometry. This feature normalization yields parameters that are invariant under any bijective stationary transformation. Moreover, it is robust to additive, quasi-stationary noise that is uncorrelated with speech. The only requirement is that of ergodicity. The frontend is called SWAMP (Sweeping Metric Parameterization). It assumes that speech resides in a small, smooth manifold that is entirely and densely explored during the course of an utterance. It first observes the tangent spaces at every point of the manifold. This defines a local Riemannian geometry. Under this geometry, we are able to measure geodesic lengths on the manifold. These lengths are invariant under non-linear transformations. Therefore, we are able to locate a point invariantly by measuring its relative distance to all other observed points. Through classical multi-dimensional scaling, we map this triangulation to a canonical, Euclidean, isometric space inherent to the observed manifold. Combined with standard features, SWAMP features are shown to improve speaker clustering on Broadcast News.
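The final step described in the abstract, mapping pairwise distances to a Euclidean space through classical multi-dimensional scaling, can be illustrated with a minimal sketch. This is not the SWAMP implementation: the function names `classical_mds` and `pairwise_distances` are hypothetical, and the sketch assumes the geodesic distances have already been computed and collected in a symmetric matrix (here ordinary Euclidean distances stand in for them).

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points in Euclidean space from a symmetric pairwise distance matrix.

    `dist` stands in for the matrix of geodesic lengths; how those lengths
    are estimated is outside the scope of this sketch.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues, which appear when the distances
    # are not exactly Euclidean.
    scales = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scales


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(20, 3))
    dist = pairwise_distances(points)
    embedding = classical_mds(dist, n_components=3)
    # The embedding reproduces the input distances up to rotation and translation.
    print(np.allclose(pairwise_distances(embedding), dist, atol=1e-6))
```

Because the embedding depends only on the distance matrix, any transformation of the original coordinates that leaves those distances unchanged produces the same embedding up to rotation and reflection, which is the sense in which the resulting space is isometric.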
Similar resources
Analysis of gender normalization using MLP and VTLN features
This paper analyzes the capability of multilayer perceptron frontends to perform speaker normalization. We find the context decision tree to be a very useful tool to assess the speaker normalization power of different frontends. We introduce a gender question into the training of the phonetic context decision tree. After the context clustering the gender specific models are counted. We compare ...
Bilateral Weighted Fuzzy C-Means Clustering
Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...
E-HMM approach for learning and adapting sound models for speaker indexing
This paper presents an iterative process for blind speaker indexing based on an HMM. This process detects and adds speakers one after the other to the evolutive HMM (E-HMM). The use of this HMM approach takes advantage of the different components of the AMIRAL automatic speaker recognition system (ASR system: frontend processing, learning, log-likelihood ratio computing) from LIA. The proposed soluti...
On the use of perceptual Line Spectral pairs Frequencies and higher-order residual moments for Speaker Identification
Conventional Speaker Identification (SI) systems utilise spectral features like Mel-Frequency Cepstral Coefficients (MFCC) or Perceptual Linear Prediction (PLP) as a frontend module. Line Spectral pairs Frequencies (LSF) are a popular alternative representation of Linear Prediction Coefficients (LPC). In this paper, an investigation is carried out to extract LSF from perceptually modified speech....
Voting for two speaker segmentation
The process of locating the end points of each speaker's voice in an audio file and then clustering segments based on speaker identity is called speaker segmentation. In this paper we present a method for two speaker segmentation, though it can be extended to more than two speakers. Most methods for speaker segmentation and clustering start with an initial computationally inexpensive speaker seg...